Visualization
In this final lab, we'll see how to make some advanced plots, change palettes, and leverage the top two packages for this purpose: Matplotlib and plot.ly. We'll also enhance our visualization methods using Seaborn.
Matplotlib, Pyplot and Seaborn
The best known package for plotting is Matplotlib, which brings MATLAB-style plotting syntax to Python. This is mostly done through its subpackage pyplot. It is very well documented; I suggest you go through the tutorial yourself!
We have used the pyplot package many times, so we will focus mostly on some more advanced plots that we have not seen before, and on ways to style and use our plots more effectively.
The basic pyplot structure involves:
- Define an empty plot.
- Define the data and the basic structure.
- Add details (labels, axes, marks, sizes...)
- Display the plot.
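The four steps listed above can be sketched as follows. This is only a minimal illustration: the sine curve, the labels and the figure size are arbitrary choices made for this example, not part of the lab.

```python
import numpy as np
import matplotlib.pyplot as plt

# 1. Define an empty plot: one figure holding one set of axes.
fig, ax = plt.subplots(figsize=(6, 4))

# 2. Define the data and the basic structure (a sine curve, chosen only for illustration).
x = np.linspace(0, 2 * np.pi, 100)
line, = ax.plot(x, np.sin(x), label="sin(x)")

# 3. Add details: title, axis labels, axis limits and a legend.
ax.set_title("Basic pyplot structure")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_xlim(0, 2 * np.pi)
ax.legend()

# 4. Display the plot.
plt.show()
```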
However, there are so many types of plots that the explanation above is too simplistic. A much better choice is to start with a few plots and play around with them.
As a first example, we'll draw a violin plot, which is a great way to see distributions.
Let's import all packages we need.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
%matplotlib inline
A violin plot is similar to a box plot, but it also draws an estimate of the density of the data, so it shows distribution and dispersion simultaneously. For this, we'll also use matplotlib's companion package seaborn, which simplifies some plots.
Warning: Seaborn can be very restrictive. Sometimes it is much better to simply draw with pyplot and use seaborn only to style the plot.
We will plot the venerable iris dataset, the best known dataset in pattern recognition. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
# Load dataframe.
df = sns.load_dataset('iris')
# Violin plot
ax = sns.violinplot( x=df["species"], y=df["sepal_length"], palette="Blues")
Let's alter the plot a bit. Seaborn allows using "styles". Each style will give the plot a completely different look. The five basic styles are darkgrid, whitegrid, dark, white, and ticks, and each can be set with the function set_style.
sns.set_style("darkgrid")
ax = sns.violinplot( x=df["species"], y=df["sepal_length"], palette="Blues")
sns.set_style("dark")
ax = sns.violinplot( x=df["species"], y=df["sepal_length"], palette="Blues")
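Following the earlier warning, here is a rough sketch of the opposite division of labour: pyplot draws the violins and seaborn only supplies the look. The data are synthetic stand-ins (not the iris measurements) and the group names are made up for illustration. The style is applied with sns.axes_style used as a context manager, which, unlike set_style, only affects the plots created inside the block.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in data (NOT the iris measurements): three groups of 50 values.
rng = np.random.default_rng(0)
groups = {
    "group A": rng.normal(5.0, 0.4, 50),
    "group B": rng.normal(6.0, 0.5, 50),
    "group C": rng.normal(6.5, 0.6, 50),
}

# seaborn only provides the style; the drawing is done by pyplot.
with sns.axes_style("darkgrid"):
    fig, ax = plt.subplots()
    # Axes.violinplot takes one array per violin and places them at x = 1, 2, 3, ...
    parts = ax.violinplot(list(groups.values()), showmedians=True)
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels(list(groups.keys()))
    ax.set_ylabel("value")
    # The returned dictionary exposes the drawn pieces, so each can be restyled.
    for body in parts["bodies"]:
        body.set_facecolor("steelblue")
        body.set_alpha(0.7)
    plt.show()
```

The price of the extra control is that tick labels and colours have to be set by hand, which seaborn would otherwise do for us.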
What if we want to focus on a particular group? We can do this by simply assigning a palette colour to each element.
my_pal = {"versicolor": "g", "setosa": "lightgrey", "virginica":"lightgrey"}
ax = sns.violinplot( x=df["species"], y=df["sepal_length"], palette=my_pal)
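With more than a handful of categories, writing the palette dictionary by hand gets tedious. The sketch below builds the same mapping programmatically; highlight_palette is a hypothetical helper written for this example, not a seaborn function.

```python
def highlight_palette(categories, focus, focus_colour="g", other_colour="lightgrey"):
    """Map every category to other_colour, except focus, which gets focus_colour."""
    return {c: (focus_colour if c == focus else other_colour) for c in categories}


# The three iris species, highlighting versicolor as in the plot above.
species = ["setosa", "versicolor", "virginica"]
my_pal = highlight_palette(species, focus="versicolor")
print(my_pal)
```

With the dataframe loaded, the category list could be taken from df["species"].unique() instead of being typed out.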
There are many colours we can use; the full list is here.
We can control a whole set of parameters in seaborn. To see the full list, run the following:
sns.axes_style()
{'axes.axisbelow': True,
'axes.edgecolor': 'white',
'axes.facecolor': '#EAEAF2',
'axes.grid': False,
'axes.labelcolor': '.15',
'axes.spines.bottom': True,
'axes.spines.left': True,
'axes.spines.right': True,
'axes.spines.top': True,
'figure.facecolor': 'white',
'font.family': ['sans-serif'],
'font.sans-serif': ['Arial',
'DejaVu Sans',
'Liberation Sans',
'Bitstream Vera Sans',
'sans-serif'],
'grid.color': 'white',
'grid.linestyle': '-',
'image.cmap': 'rocket',
'lines.solid_capstyle': 'round',
'patch.edgecolor': 'w',
'patch.force_edgecolor': True,
'text.color': '.15',
'xtick.bottom': False,
'xtick.color': '.15',
'xtick.direction': 'out',
'xtick.top': False,
'ytick.color': '.15',
'ytick.direction': 'out',
'ytick.left': False,
'ytick.right': False}
To set any of these, call the set_style function and pass it a dictionary with the parameters you would like to alter.
sns.set_style("darkgrid", {"axes.grid": True})
ax = sns.violinplot( x=df["species"], y=df["sepal_length"], palette=my_pal)
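To check what an override does before changing the global style, axes_style can be called directly: it returns the parameter dictionary without applying it. The sketch below also uses it as a context manager, so the style only applies to the plot created inside the block. The specific overrides (background colour, dashed grid lines) are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# axes_style returns the parameters a style would set, without applying them.
custom = sns.axes_style("darkgrid", {"axes.grid": False, "axes.facecolor": "#F5F5F5"})
print(custom["axes.grid"], custom["axes.facecolor"])

# Used as a context manager, the style only affects plots created inside the block;
# the global style set earlier with set_style is left untouched.
with sns.axes_style("whitegrid", {"grid.linestyle": "--"}):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    plt.show()
```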
Seaborn also comes with "contexts", which are predefined sets of parameters that scale the plot elements for a certain intended use of the plot. These are paper, notebook, talk, and poster.
sns.set_context("paper")
ax = sns.violinplot( x=df["species"], y=df["sepal_length"], palette=my_pal)
ax.set_title("Paper context")
Text(0.5, 1.0, 'Paper context')
sns.set_context("talk")
ax = sns.violinplot( x=df["species"], y=df["sepal_length"], palette=my_pal)
ax.set_title("Talk context")
Text(0.5, 1.0, 'Talk context')
sns.set_context("poster")
ax = sns.violinplot( x=df["species"], y=df["sepal_length"], palette=my_pal)
ax.set_title("Poster context")
Text(0.5, 1.0, 'Poster context')
You can again fine-tune these parameters, either through the function set or by passing a dictionary of overrides to set_context.
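As a small sketch of that fine-tuning, plotting_context returns the parameters a context would use, so the effect of font_scale and of an rc dictionary can be inspected before applying them. The scale factor and the line width below are arbitrary values picked for this example.

```python
import seaborn as sns

# Parameters of the plain notebook context, and of a tuned variant.
base = sns.plotting_context("notebook")
scaled = sns.plotting_context("notebook", font_scale=1.5, rc={"lines.linewidth": 3})

# Compare a few entries: fonts are multiplied by font_scale, rc entries are overridden.
for key in ("font.size", "axes.labelsize", "lines.linewidth"):
    print(key, base[key], "->", scaled[key])

# Apply the tuned context to all subsequent plots.
sns.set_context("notebook", font_scale=1.5, rc={"lines.linewidth": 3})
```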
Multiple figures
Seaborn's axes-level functions, such as violinplot, return a matplotlib "Axes" object. We can manipulate these axes and change many of their properties. The list is extensive, so it is much better to just experiment with them.
For example, let's draw the violin plots for the whole dataset in a square layout. Seaborn provides grid plots, which split the data into subsets and draw each one on its own axes. See the axis grid tutorial for more details. However, the more detailed control comes from pyplot. The following code creates one subplot per variable.
sns.set_context("notebook")
sns.set_style("dark")
# Subplots, in a square shape
fig, axs = plt.subplots(nrows = 2, ncols = 2, figsize=(10,10))
sns.violinplot( x=df["species"], y=df["sepal_length"], palette="Blues", ax=axs[0,0])
sns.violinplot( x=df["species"], y=df["sepal_width"], palette="Blues", ax=axs[0,1])
sns.violinplot( x=df["species"], y=df["petal_length"], palette="Blues", ax=axs[1,0])
sns.violinplot( x=df["species"], y=df["petal_width"], palette="Blues", ax=axs[1,1])
plt.show()
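The four near-identical calls above can also be written as a loop over the flattened array of axes. The sketch below uses a randomly generated stand-in dataframe with the same column names as iris (the values are not real measurements), so that it runs without downloading anything; with the real data, demo would simply be replaced by df.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in with the iris column names; values are random, not real measurements.
rng = np.random.default_rng(0)
measures = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
demo = pd.DataFrame(rng.normal(5, 1, size=(150, 4)), columns=measures)
demo["species"] = np.repeat(["setosa", "versicolor", "virginica"], 50)

# axs is a 2x2 array of Axes; axs.flat walks through it row by row.
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
for ax, column in zip(axs.flat, measures):
    sns.violinplot(data=demo, x="species", y=column, ax=ax)
    ax.set_title(column)

fig.tight_layout()  # avoid overlapping labels between subplots
plt.show()
```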
Two-dimensional plots
A final plot we'll see is the kernel density estimate (KDE), which allows visualizing densities. A simple one plots two variables for all classes together.
sns.jointplot(x="sepal_length", y="sepal_width", data=df, kind="kde");
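To see what such a plot estimates, the sketch below computes a two-dimensional Gaussian KDE by hand with scipy.stats.gaussian_kde and draws its contours with pyplot. It uses a synthetic correlated sample rather than the iris data, and seaborn's own implementation and bandwidth choice may differ in detail, so treat it as an illustration of the idea only.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic 2-D sample (illustrative only): 300 correlated Gaussian points.
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(mean=[0, 0], cov=[[1.0, 0.5], [0.5, 1.0]], size=300)

# gaussian_kde expects an array of shape (n_dimensions, n_points).
kde = stats.gaussian_kde(sample.T)

# Evaluate the estimated density on a regular grid.
xs = np.linspace(-4, 4, 60)
ys = np.linspace(-4, 4, 60)
grid_x, grid_y = np.meshgrid(xs, ys)
grid_points = np.vstack([grid_x.ravel(), grid_y.ravel()])
density = kde(grid_points).reshape(grid_x.shape)

# Contour lines of the density, comparable to the central panel of the joint plot.
fig, ax = plt.subplots()
ax.contour(grid_x, grid_y, density, levels=6)
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```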
A much more sophisticated plot shows the KDE for all variables at once. It requires a PairGrid.
sns.set_style("white")
# Define a grid for each element in the dataframe.
g = sns.PairGrid(df)
# Apply seaborn function kdeplot for each level for the diagonal.
g.map_diag(sns.kdeplot)
# Apply function off the diagonal
g.map_offdiag(sns.kdeplot, levels=6);
This is just a taster of the many plots you can create. We can create many plots using matplotlib, seaborn and all the subpackages available. Go through the Python Plot Gallery to get inspired!
Plot.ly
Plot.ly is a commercial package with an open source component aimed at supporting professional interactive dashboards. It is a very powerful package, and it is also a bit more complicated to use than Matplotlib.
First, in order to use the package, we need a few lines of code that tell Colab what it needs in order to render the plot correctly. We'll define a function which simply tells Colab "this is how you plot this".
import plotly.offline as py
import pandas as pd
import plotly.graph_objs as go
def enable_plotly_in_cell():
    from IPython.display import HTML, display
    from plotly.offline import init_notebook_mode
    # Load require.js, a set of Javascript functions which will help render the plot.
    display(HTML('''
        <script src="/static/components/requirejs/require.js"></script>
    '''))
    init_notebook_mode(connected=False)
Normally, you would need to run this in every cell where you call a plotly plot. We can automatically call this in every cell by adding a "hook", which will call the function in each cell. The following code does so.
# Run function automatically in every cell.
get_ipython().events.register('pre_run_cell', enable_plotly_in_cell)
We can now plot something. Plots in plotly can become really complex, such as this world map of world GDP in 2014, drawn as a so-called choropleth map. This particular one is taken from Plotly's tutorial.
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_world_gdp_with_codes.csv')
data = [go.Choropleth(
    locations = df['CODE'],
    z = df['GDP (BILLIONS)'],
    text = df['COUNTRY'],
    colorscale = [
        [0, "rgb(5, 10, 172)"],
        [0.35, "rgb(40, 60, 190)"],
        [0.5, "rgb(70, 100, 245)"],
        [0.6, "rgb(90, 120, 245)"],
        [0.7, "rgb(106, 137, 247)"],
        [1, "rgb(220, 220, 220)"]
    ],
    autocolorscale = False,
    reversescale = True,
    marker = go.choropleth.Marker(
        line = go.choropleth.marker.Line(
            color = 'rgb(180,180,180)',
            width = 0.5
        )),
    colorbar = go.choropleth.ColorBar(
        tickprefix = '$',
        title = 'GDP<br>Billions US$'),
)]
layout = go.Layout(
    title = go.layout.Title(
        text = '2014 Global GDP'
    ),
    geo = go.layout.Geo(
        showframe = False,
        showcoastlines = False,
        projection = go.layout.geo.Projection(
            type = 'equirectangular'
        )
    ),
    annotations = [go.layout.Annotation(
        x = 0.55,
        y = 0.1,
        xref = 'paper',
        yref = 'paper',
        text = ('Source: <a href="https://www.cia.gov/library/publications/the-world-factbook/fields/2195.html">'
                'CIA World Factbook</a>'),
        showarrow = False
    )]
)
fig = go.Figure(data = data, layout = layout)
py.iplot(fig, filename = 'd3-world-map')